Accelerating X-Ray CT Reconstruction using SIMD and Half Precision Floating-Point on Intel Xeon Processor
Authors
Abstract
Our group worked on accelerating X-ray computed tomography (CT) reconstruction with a statistical image reconstruction algorithm, using single-instruction, multiple-data (SIMD) instructions to speed up the regularizer. We also tried a half-precision floating-point data format to mitigate the memory bandwidth problem. Our results show a 2.5x speedup from combining SIMD instructions with multi-threading, compared with a multi-threaded program that is not SIMD-optimized. A memory bandwidth analysis showed that the SIMD-optimized regularizer is not memory-bandwidth limited. Neither the fixed-point nor the half-precision floating-point representation turned out to be suitable for the regularizer, because both produced large errors.
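To make the precision finding concrete, the sketch below shows how rounding to half precision can distort a difference-based penalty. It is only an illustration under stated assumptions: the abstract does not specify the regularizer, so a simple sum of squared neighbor differences stands in for it, and the sample values, `to_half`, and `roughness` are hypothetical names chosen for this example.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def roughness(values):
    """Sum of squared differences between adjacent samples.

    Assumption: a stand-in quadratic penalty, not the paper's regularizer.
    """
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


# Hypothetical image row: values near 1000 that rise in steps of 0.1.
row = [1000.0 + 0.1 * i for i in range(16)]

# Between 512 and 1024, binary16 values are spaced 0.5 apart,
# so the 0.1 steps collapse into a few jumps of 0.5.
row_half = [to_half(v) for v in row]

exact = roughness(row)        # penalty computed on the original values
approx = roughness(row_half)  # penalty computed on half-precision values
rel_error = abs(approx - exact) / exact

print(f"exact={exact:.4f} half={approx:.4f} relative error={rel_error:.2f}")
```

With these sample values the half-precision penalty is several times the reference value, because the spacing between representable numbers is larger than the differences being measured. Whether the same mechanism explains the errors reported in the paper is not stated in the abstract.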
Similar resources
Exploring SIMD for Molecular Dynamics, Using Intel
We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD is discu...
Accelerating DNA Sequence Analysis using Intel Xeon Phi
Genetic information is increasing exponentially, doubling every 18 months. Analyzing this information within a reasonable amount of time requires parallel computing resources. While considerable research has addressed DNA analysis using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we present an algorithm for large-scale DNA analysis that exploit...
Characterization of Intel Xeon Phi for Linear Algebra Workloads
This study focuses on applicability of Intel Xeon Phi coprocessor for some of the Basic Linear Algebra Subprograms (BLAS) subroutines. Based on Many Integrated Core (MIC) architecture, the vector processing unit (VPU) in Xeon Phi coprocessor provides data parallelism at a very fine grain, working on 512 bits of 16 single-precision floats or 32-bit integers at a time. In our work we analyze how ...
Co-design of a Particle-in-Cell Plasma Simulation Code for Intel Xeon Phi: A First Look at Knights Landing
Three dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand in computational resources inspires research in improving efficiency and co-design for supercomputers based on manycore architectures. This paper presents first per...
Intel Skylake review
The Skylake Xeon (called Xeon Scalable processor) introduced a number of innovations, notably AVX-512 vectorization instructions capable of 8-wide double precision vectors (previous AVX2 had 4-wide DP vectors). This change in itself has a potential of doubling performance of floating point codes. Other changes include CPU core optimizations, rearchitecture of the caches, and new, mesh-based top...